
HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Neural Information Processing Systems

Generative models often use human evaluations to measure the perceived quality of their outputs. Automated metrics are noisy, indirect proxies, because they rely on heuristics or pretrained embeddings. However, up until now, direct human evaluation strategies have been ad hoc, neither standardized nor validated. Our work establishes a gold standard human benchmark for generative realism. We construct Human eYe Perceptual Evaluation (HYPE), a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time. We introduce two variants: one that measures visual perception under adaptive time constraints to determine the threshold at which a model's outputs appear real (e.g.


Reviews: HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Neural Information Processing Systems

This paper introduces a framework to evaluate the perceptual realism of samples from generative models. The framework, HYPE (Human eYe Perceptual Evaluation), is based on psychophysics methods. Two different metrics are proposed. The first one, HYPE_time, measures the exposure time a human needs before distinguishing a real image from a fake one. The metric is clearly defined and well founded on psychophysics.
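The adaptive time-constraint procedure behind HYPE_time resembles a classic psychophysics staircase, which raises or lowers stimulus exposure based on the rater's answers. As a rough illustration only (a hypothetical sketch, not the authors' implementation), a simple 1-up/1-down staircase over exposure times might look like this:

```python
import random

def staircase_threshold(judge, exposures_ms, trials=50):
    """Simple 1-up/1-down adaptive staircase over exposure times.

    `judge(exposure_ms, is_real)` stands in for a human rater and
    returns True when the rater classifies the image correctly at
    that exposure. Correct answers shorten the next exposure (making
    the task harder); errors lengthen it (making it easier). The
    exposure the staircase settles on approximates the rater's
    perceptual threshold.
    """
    idx = len(exposures_ms) // 2          # start mid-range
    for _ in range(trials):
        is_real = random.random() < 0.5   # real and fake shown equally often
        correct = judge(exposures_ms[idx], is_real)
        if correct:
            idx = max(0, idx - 1)                          # shorter exposure
        else:
            idx = min(len(exposures_ms) - 1, idx + 1)      # longer exposure
    return exposures_ms[idx]
```

A 1-up/1-down rule converges near the 50% correct point; psychophysics experiments often use variants such as 2-down/1-up to target higher accuracy levels, and the paper's actual adaptive schedule may differ.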


Reviews: HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Neural Information Processing Systems

The reviewers were unanimous in judging that this is good quality work that tackles an important and relevant problem for NeurIPS, and that it will attract the attention of a wide audience. The rebuttal solidified this viewpoint in the discussions thereafter. Given the enthusiastic reviews, I think this deserves an oral presentation at NeurIPS.


HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Zhou, Sharon, Gordon, Mitchell, Krishna, Ranjay, Narcomey, Austin, Fei-Fei, Li F., Bernstein, Michael

Neural Information Processing Systems
